|

1.

ENIGMA: A Web Application for Running Online Artificial Grammar Learning Experiments.

Chen, Tsung-Ying.

J Psycholinguist Res ; 53(3): 38, 2024 Apr 24.

Article En | MEDLINE | ID: mdl-38656669

Artificial grammar learning (AGL) is an experimental paradigm frequently adopted to investigate the unconscious and conscious learning and application of linguistic knowledge. This paper will introduce ENIGMA ( https://enigma-lang.org ) as a free, flexible, and lightweight Web-based tool for running online AGL experiments. The application is optimized for desktop and mobile devices with a user-friendly interface, which can present visual and aural stimuli and elicit judgment responses with RT measures. Without limits in time and space, ENIGMA could help collect more data from participants with diverse personal and language backgrounds and variable cognitive skills. Such data are essential to explain complex factors influencing learners' performance in AGL experiments and answer various research questions regarding L1/L2 acquisition. The introduction of the core features in ENIGMA is followed by an example study that partially replicated Chen (Lang Acquis 27(3):331-361, 2020) to illustrate possible experimental designs and examine the quality of the collected data.

Learning , Humans , Psycholinguistics , Linguistics , Internet , Language , Multilingualism

2.

Leveraging Large Language Models for Improved Patient Access and Self-Management: Assessor-Blinded Comparison Between Expert- and AI-Generated Content.

Lv, Xiaolei; Zhang, Xiaomeng; Li, Yuan; Ding, Xinxin; Lai, Hongchang; Shi, Junyu.

J Med Internet Res ; 26: e55847, 2024 Apr 25.

Article En | MEDLINE | ID: mdl-38663010

BACKGROUND: While large language models (LLMs) such as ChatGPT and Google Bard have shown significant promise in various fields, their broader impact on enhancing patient health care access and quality, particularly in specialized domains such as oral health, requires comprehensive evaluation. OBJECTIVE: This study aims to assess the effectiveness of Google Bard, ChatGPT-3.5, and ChatGPT-4 in offering recommendations for common oral health issues, benchmarked against responses from human dental experts. METHODS: This comparative analysis used 40 questions derived from patient surveys on prevalent oral diseases, which were executed in a simulated clinical environment. Responses, obtained from both human experts and LLMs, were subject to a blinded evaluation process by experienced dentists and lay users, focusing on readability, appropriateness, harmlessness, comprehensiveness, intent capture, and helpfulness. Additionally, the stability of artificial intelligence responses was also assessed by submitting each question 3 times under consistent conditions. RESULTS: Google Bard excelled in readability but lagged in appropriateness when compared to human experts (mean 8.51, SD 0.37 vs mean 9.60, SD 0.33; P=.03). ChatGPT-3.5 and ChatGPT-4, however, performed comparably with human experts in terms of appropriateness (mean 8.96, SD 0.35 and mean 9.34, SD 0.47, respectively), with ChatGPT-4 demonstrating the highest stability and reliability. Furthermore, all 3 LLMs received superior harmlessness scores comparable to human experts, with lay users finding minimal differences in helpfulness and intent capture between the artificial intelligence models and human responses. CONCLUSIONS: LLMs, particularly ChatGPT-4, show potential in oral health care, providing patient-centric information for enhancing patient education and clinical care. The observed performance variations underscore the need for ongoing refinement and ethical considerations in health care settings. Future research focuses on developing strategies for the safe integration of LLMs in health care settings.

Self-Management , Humans , Self-Management/methods , Artificial Intelligence , Health Services Accessibility , Language , Oral Health

3.

Natural language processing (NLP) to facilitate abstract review in medical research: the application of BioBERT to exploring the 20-year use of NLP in medical research.

Masoumi, Safoora; Amirkhani, Hossein; Sadeghian, Najmeh; Shahraz, Saeid.

Syst Rev ; 13(1): 107, 2024 Apr 15.

Article En | MEDLINE | ID: mdl-38622611

BACKGROUND: Abstract review is a time and labor-consuming step in the systematic and scoping literature review in medicine. Text mining methods, typically natural language processing (NLP), may efficiently replace manual abstract screening. This study applies NLP to a deliberately selected literature review problem, the trend of using NLP in medical research, to demonstrate the performance of this automated abstract review model. METHODS: Scanning PubMed, Embase, PsycINFO, and CINAHL databases, we identified 22,294 with a final selection of 12,817 English abstracts published between 2000 and 2021. We invented a manual classification of medical fields, three variables, i.e., the context of use (COU), text source (TS), and primary research field (PRF). A training dataset was developed after reviewing 485 abstracts. We used a language model called Bidirectional Encoder Representations from Transformers to classify the abstracts. To evaluate the performance of the trained models, we report a micro f1-score and accuracy. RESULTS: The trained models' micro f1-score for classifying abstracts, into three variables were 77.35% for COU, 76.24% for TS, and 85.64% for PRF. The average annual growth rate (AAGR) of the publications was 20.99% between 2000 and 2020 (72.01 articles (95% CI: 56.80-78.30) yearly increase), with 81.76% of the abstracts published between 2010 and 2020. Studies on neoplasms constituted 27.66% of the entire corpus with an AAGR of 42.41%, followed by studies on mental conditions (AAGR = 39.28%). While electronic health or medical records comprised the highest proportion of text sources (57.12%), omics databases had the highest growth among all text sources with an AAGR of 65.08%. The most common NLP application was clinical decision support (25.45%). CONCLUSIONS: BioBERT showed an acceptable performance in the abstract review. If future research shows the high performance of this language model, it can reliably replace manual abstract reviews.

Biomedical Research , Natural Language Processing , Humans , Language , Data Mining , Electronic Health Records

4.

Agricultural education in Africa using YouTube multilingual animations: A retrospective feasibility study assessing costs to reach language-diverse populations.

Reeves, N Peter; Sal Y Rosas Celi, Victor Giancarlo; Lutomia, Anne N; Medendorp, John William; Bello-Bravo, Julia; Pittendrigh, Barry.

PLoS One ; 19(4): e0302136, 2024.

Article En | MEDLINE | ID: mdl-38635490

There is a critical need for widespread information dissemination of agricultural best practices in Africa. Literacy, language and resource barriers often impede such information dissemination. Culturally and linguistically localized, computer-animated training videos placed on YouTube and promoted through paid advertising is a potential tool to help overcome these barriers. The goal of this study is to assess the feasibility of reaching language-diverse populations in Africa using this new type of information dissemination channel. As a case study, cost estimates were obtained for YouTube ad campaigns of a video to prevent post-harvest loss through safe food storage using sanitized jerrycan containers. Seventy-three video variants were created for the most common 16 languages in Ghana, 35 languages in Kenya, and 22 languages in Nigeria. Using these videos, campaigns were deployed country wide or focused on zones of influence that represent economically underdeveloped regions known to produce beans suitable for jerrycan storage. Using data collected from YouTube ad campaigns, language-specific models were created for each country to estimate how many viewers could be reached per US dollar spent. Separate models were created to estimate the number of viewers who watched 25% and 75% of the video (most of video without end credits), reflecting different levels of engagement. For language campaigns with both country wide and zone of influence areas of deployment, separate region-specific models were created. Models showed that the estimated number of viewers per dollar spent varied considerably amongst countries and languages. On average, the expected number of viewers per dollar spent were 1.8 (Range = 0.2-7.3) for 25% watched and 0.8 (Range = 0.1-3.2) for 75% watched in Ghana, 1.2 (0.2-4.8) for 25% watched and 0.5 (Range = 0.1-2.0) for 75% watched in Kenya, and 0.4 (Range = 0.2-1.3) for 25% watched and 0.2 (Range = 0.1-0.5) for 75% watched in Nigeria. English versions of the video were the most cost-effective in reaching viewers in Ghana and Nigeria. In Kenya, English language campaigns ranked 28 (country wide) and 36 (zones of influence) out of 37 analyzed campaigns. Results also showed that many local language campaigns performed well, opening the possibility that targeted knowledge dissemination on topics of importance to local populations, is potentially cost effective. In addition, such targeted information dissemination appears feasible, even during regional and global crises when in-person training may not be possible. In summary, leveraging multilingual computer-animations and digital platforms such as YouTube shows promise for conducting large-scale agricultural education campaigns. The findings of the current study provides the justification to pursue a more rigorous prospective study to verify the efficacy of knowledge exchange and societal impact through this form of information dissemination channel.

Social Media , Humans , Feasibility Studies , Prospective Studies , Retrospective Studies , Language , Information Dissemination/methods , Ghana , Video Recording

5.

Boosting or inhibiting - how semantic-pragmatic and syntactic cues affect prosodic prominence relations in German.

Baumann, Stefan; Lorenzen, Janne.

PLoS One ; 19(4): e0299746, 2024.

Article En | MEDLINE | ID: mdl-38635575

In this exploratory study, we investigate the influence of several semantic-pragmatic and syntactic factors on prosodic prominence production in German, namely referential and lexical newness/givenness, grammatical role, and position of a referential target word within a sentence. Especially in terms of the probabilistic distribution of accent status (nuclear, prenuclear, deaccentuation) we find evidence for an additive influence of the discourse-related and syntactic cues, with lexical newness and initial sentence position showing the strongest boosting effects on a target word's prosodic prominence. The relative strength of the initial position is found in nearly all prosodic factors investigated, both discrete (such as the choice of accent type) and gradient (e.g., scaling of the Tonal Center of Gravity and intensity). Nevertheless, the differentiation of prominence relations is information-structurally less important in the beginning of an utterance than near the end: The prominence of the final object relative to the surrounding elements, especially the verbal component, is decisive for the interpretation of the sentence. Thus, it seems that a speaker adjusts locally important prominence relations (object vs. verb in sentence-final position) in addition to a more global, rhythmically determined distribution of prosodic prominences across an utterance.

Semantics , Speech Perception , Cues , Language

6.

Language distance and labor market integration of migrants: Gendered perspective.

Bar-Haim, Eyal; Birgier, Debora Pricila.

PLoS One ; 19(4): e0299936, 2024.

Article En | MEDLINE | ID: mdl-38635777

This paper examines the distinct effects of linguistics distance and language literacy on the labor market integration of migrant men and women. Using data from the Programme for International Assessment of Adult Competencies (PIAAC) 2018 in 16 countries of destination mainly from Europe and more than 110 languages of origin, we assess migrant labor force participation, employment, working hours, and occupational prestige. The study finds that linguistics distance of the first language studied has a significant negative association with labor force participation, employment, and working hours of migrant women, even after controlling for their abilities in their destination language, education, and cultural distance between the country of origin and destination. In contrast, linguistics distance is only negatively associated with migrant men's working hours. This suggests that linguistic distance serves as a proxy for cultural aspects, which are not captured by cultural distance and hence shape the labor market integration of migrant women due to cultural factors rather than human capital. We suggest that the gender aspect of the effect of language proximity is essential in understanding the intersectional position of migrant women in the labor force.

Transients and Migrants , Female , Humans , Socioeconomic Factors , Demography , Population Dynamics , Emigration and Immigration , Developed Countries , Developing Countries , Language , Economics

7.

How long is long? Word length effects in reading correspond to minimal graphemic units: An MEG study in Bangla.

Moitra, Swarnendu; Chacón, Dustin A; Stockall, Linnaea.

PLoS One ; 19(4): e0292979, 2024.

Article En | MEDLINE | ID: mdl-38635827

This paper presents a magnetoencephalography (MEG) study on reading in Bangla, an east Indo-Aryan language predominantly written in an abugida script. The study aims to uncover how visual stimuli are processed and mapped onto abstract linguistic representations in the brain. Specifically, we investigate the neural responses that correspond to word length in Bangla, a language with a unique orthography that introduces multiple ways to measure word length. Our results show that MEG signals localised in the anterior left fusiform gyrus, at around 130ms, are highly correlated with word length when measured in terms of the number of minimal graphemic units in the word rather than independent graphemic units (aksar) or phonemes. Our findings suggest that minimal graphemic units could serve as a suitable metric for measuring word length in non-alphabetic orthographies such as Bangla.

Magnetoencephalography , Reading , Language , Linguistics , Brain/physiology

8.

Evaluation of Large Language Model Performance and Reliability for Citations and References in Scholarly Writing: Cross-Disciplinary Study.

Mugaanyi, Joseph; Cai, Liuying; Cheng, Sumei; Lu, Caide; Huang, Jing.

J Med Internet Res ; 26: e52935, 2024 Apr 05.

Article En | MEDLINE | ID: mdl-38578685

BACKGROUND: Large language models (LLMs) have gained prominence since the release of ChatGPT in late 2022. OBJECTIVE: The aim of this study was to assess the accuracy of citations and references generated by ChatGPT (GPT-3.5) in two distinct academic domains: the natural sciences and humanities. METHODS: Two researchers independently prompted ChatGPT to write an introduction section for a manuscript and include citations; they then evaluated the accuracy of the citations and Digital Object Identifiers (DOIs). Results were compared between the two disciplines. RESULTS: Ten topics were included, including 5 in the natural sciences and 5 in the humanities. A total of 102 citations were generated, with 55 in the natural sciences and 47 in the humanities. Among these, 40 citations (72.7%) in the natural sciences and 36 citations (76.6%) in the humanities were confirmed to exist (P=.42). There were significant disparities found in DOI presence in the natural sciences (39/55, 70.9%) and the humanities (18/47, 38.3%), along with significant differences in accuracy between the two disciplines (18/55, 32.7% vs 4/47, 8.5%). DOI hallucination was more prevalent in the humanities (42/55, 89.4%). The Levenshtein distance was significantly higher in the humanities than in the natural sciences, reflecting the lower DOI accuracy. CONCLUSIONS: ChatGPT's performance in generating citations and references varies across disciplines. Differences in DOI standards and disciplinary nuances contribute to performance variations. Researchers should consider the strengths and limitations of artificial intelligence writing tools with respect to citation accuracy. The use of domain-specific models may enhance accuracy.

Artificial Intelligence , Language , Humans , Reproducibility of Results , Research Personnel , Writing

9.

Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.

Hadar-Shoval, Dorit; Asraf, Kfir; Mizrachi, Yonathan; Haber, Yuval; Elyoseph, Zohar.

JMIR Ment Health ; 11: e55988, 2024 Apr 09.

Article En | MEDLINE | ID: mdl-38593424

BACKGROUND: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making have ethical importance. Schwartz's theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics. OBJECTIVE: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit distinct value-like patterns from humans and each other. METHODS: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs' value profiles, their results were compared to published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests. RESULTS: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs' value profiles and population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction, while de-emphasizing achievement, power, and security relative to humans. Successful discriminant analysis differentiated the 4 LLMs' distinct value profiles. Further examination found the biased value profiles strongly predicted the LLMs' responses when presented with mental health dilemmas requiring choosing between opposing values. This provided further validation for the models embedding distinct motivational value-like constructs that shape their decision-making. CONCLUSIONS: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated the STBV can effectively characterize value-like infrastructure within LLMs, substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivation mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.

Allied Health Personnel , Mental Health , Humans , Cross-Sectional Studies , Reproducibility of Results , Language

10.

Pursuance of a Yoruba name for cervical cancer in Southwest Nigeria: a case study.

Balogun, Folusho Mubowale; Omotade, Olayemi.

BMJ Open ; 14(4): e074020, 2024 Apr 23.

Article En | MEDLINE | ID: mdl-38658005

OBJECTIVES: Participants' comprehension of research process affects the quality of research output, which is the reason why translation of research instruments into local languages is standard practice. Literature has consistently reported that in Africa, knowledge about cervical cancer is low but paradoxically, expressed, and actual uptake of human papillomavirus vaccine for its prevention is high. This study explored the Yoruba names of cervical cancer among Yoruba people in Ibadan, Nigeria to guide the translation of cervical cancer research instruments to Yoruba language. DESIGN: Exploratory case study design was used and data were obtained with 10 in-depth interviews and four focused group discussions. Data were analysed using content analysis. SETTINGS: The study took place in Ibadan North local government area, Southwest Nigeria. PARTICIPANTS: These were 4 traditional healers, 3 Yoruba linguists, 3 public health educators and 38 parents of adolescents. MEASURES: These were Yoruba names for cervical cancer and their meanings. RESULTS: Participants were aware of cervical cancer but only the traditional healers and public health educators had names for it. These names were highly varied. The public health educators gave names that were linked with different parts of the female reproductive system and external genital which were actually different medical conditions. Each traditional healer also had different names for cervical cancer, which either described the female body parts, or symptoms of female genital infections. These various names can lead to unnecessary misconceptions and misinformation about cervical cancer, its prevention, management, and research. CONCLUSIONS: There was no consensus Yoruba name for cervical cancer among the study participants. Efforts to educate the Yoruba speaking populace about cervical cancer, its prevention, management and participation in its research can be frustrated if a generally accepted Yoruba name is not provided for this cancer. Stakeholders' collaboration is required to get an appropriate Yoruba name for cervical cancer.

Uterine Cervical Neoplasms , Humans , Female , Uterine Cervical Neoplasms/prevention & control , Nigeria , Adult , Health Knowledge, Attitudes, Practice , Middle Aged , Adolescent , Focus Groups , Terminology as Topic , Language , Medicine, African Traditional

11.

SpEx: a German-language dataset of speech and executive function performance.

Camilleri, Julia A; Volkening, Julia; Heim, Stefan; Mochalski, Lisa N; Neufeld, Hannah; Schlothauer, Natalie; Kuhles, Gianna; Eickhoff, Simon B; Weis, Susanne.

Sci Rep ; 14(1): 9431, 2024 04 24.

Article En | MEDLINE | ID: mdl-38658576

This work presents data from 148 German native speakers (20-55 years of age), who completed several speaking tasks, ranging from formal tests such as word production tests to more ecologically valid spontaneous tasks that were designed to mimic natural speech. This speech data is supplemented by performance measures on several standardised, computer-based executive functioning (EF) tests covering domains of working-memory, cognitive flexibility, inhibition, and attention. The speech and EF data are further complemented by a rich collection of demographic data that documents education level, family status, and physical and psychological well-being. Additionally, the dataset includes information of the participants' hormone levels (cortisol, progesterone, oestradiol, and testosterone) at the time of testing. This dataset is thus a carefully curated, expansive collection of data that spans over different EF domains and includes both formal speaking tests as well as spontaneous speaking tasks, supplemented by valuable phenotypical information. This will thus provide the unique opportunity to perform a variety of analyses in the context of speech, EF, and inter-individual differences, and to our knowledge is the first of its kind in the German language. We refer to this dataset as SpEx since it combines speech and executive functioning data. Researchers interested in conducting exploratory or hypothesis-driven analyses in the field of individual differences in language and executive functioning, are encouraged to request access to this resource. Applicants will then be provided with an encrypted version of the data which can be downloaded.

Executive Function , Speech , Humans , Executive Function/physiology , Adult , Middle Aged , Female , Male , Speech/physiology , Germany , Young Adult , Language , Memory, Short-Term/physiology , Neuropsychological Tests

12.

Evidence of Validity and Reliability for the Spanish Version of the Self-Identified Stage of Recovery.

Sampietro, Hernán María; Barrios, Maite; Berrío, Ángela I; Rojo, J Emilio; Guilera, Georgina; Gómez-Benito, Juana.

Psicothema ; 36(2): 165-173, 2024 05.

Article En | MEDLINE | ID: mdl-38661163

BACKGROUND: The Self-Identified Stage of Recovery (SISR) () is a scale used to assess both the stage of recovery (SISR-A) and the components of the process of personal recovery (SISR-B). This study aimed to develop the Spanish version of the SISR and obtain evidence of validity and reliability in a sample of 230 users of community mental health services. METHOD: The Spanish version of the SISR was developed following the translation-back translation procedure, with the support of a committee of experienced experts. The SISR was examined in terms of dimensional structure, internal consistency, relationships with other variables (i.e., the Maryland Recovery Assessment Scale [MARS-12] and the Dispositional Hope Scale [DHS]), and temporal stability (n = 66). Differential item functioning (DIF) by gender was analysed. RESULTS: The study confirmed the unidimensionality of the SISR-B and suitable internal consistency of its scores (ω = .83, α = .83). Scores from both SISR-A and SISR-B showed good temporal stability and the SISR-B displayed strong correlations with the MARS-12 (rs = .78) and the DHS (rs = .67). No DIF was found. CONCLUSIONS: This study supports the validity and reliability of the scores of the Spanish version of the SISR.

Translations , Humans , Male , Female , Reproducibility of Results , Adult , Middle Aged , Mental Disorders/psychology , Young Adult , Community Mental Health Services , Spain , Language

13.

Psychometric Properties of the Spanish Version of the Parental Feeding Style Questionnaire.

Martínez-Hernández, Irene; Olmos-Soria, Marina; Fonseca-Pedrero, Eduardo; Hidalgo, María D; Valero-García, Ana V.

Psicothema ; 36(2): 184-194, 2024 05.

Article En | MEDLINE | ID: mdl-38661165

BACKGROUND: There are no validated instruments in Spain for measuring parental feeding styles. The aim was to validate the Parental Feeding Styles Questionnaires (PFSQ) in a Spanish sample. METHOD: A total of 523 mothers of 523 school-children participated. The children had a mean age of 4.4 years (SD = 1.3), with 51% being boys (M = 4.3 years, SD = 1.4) and 49% girls (M = 4.5 years, = SD 1.3). The PFSQ and the Comprehensive General Parenting Styles Questionnaire (CGPQ) were used. RESULTS: A model of four correlated factors was identified: Prompting/encouraging eating, emotional feeding, instrumental feeding, and control over eating. Cronbach's alpha for the subscales ranged from 0.64 to 0.86, and McDonald's Omega coefficient ranged from 0.66 to 0.86. Emotional feeding and prompting/encouraging eating had values above 0.70, control over eating had a value of 0.68 and instrumental feeding had an alpha coefficient of 0.64 and omega coefficient of 0.66. The factor structure was similar to the original and to other adapted versions. The Spanish sample used more control over eating and prompting/encouraging to eat. CONCLUSIONS: The adapted PFSQ is a suitable instrument for assessing the feeding styles of Spanish parents.

Feeding Behavior , Parenting , Psychometrics , Humans , Female , Male , Spain , Parenting/psychology , Child, Preschool , Surveys and Questionnaires , Adult , Language

14.

Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals.

Choudhury, Avishek; Chaudhry, Zaira.

J Med Internet Res ; 26: e56764, 2024 Apr 25.

Article En | MEDLINE | ID: mdl-38662419

As the health care industry increasingly embraces large language models (LLMs), understanding the consequence of this integration becomes crucial for maximizing benefits while mitigating potential pitfalls. This paper explores the evolving relationship among clinician trust in LLMs, the transition of data sources from predominantly human-generated to artificial intelligence (AI)-generated content, and the subsequent impact on the performance of LLMs and clinician competence. One of the primary concerns identified in this paper is the LLMs' self-referential learning loops, where AI-generated content feeds into the learning algorithms, threatening the diversity of the data pool, potentially entrenching biases, and reducing the efficacy of LLMs. While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in health care deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. Another key takeaway from our investigation is the role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by off-loading routine tasks while maintaining a critical oversight to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. We also discuss the risks associated with the deskilling of health care professionals. Frequent reliance on LLMs for critical tasks could result in a decline in health care providers' diagnostic and thinking skills, particularly affecting the training and development of future professionals. The legal and ethical considerations surrounding the deployment of LLMs in health care are also examined. We discuss the medicolegal challenges, including liability in cases of erroneous diagnoses or treatment advice generated by LLMs. The paper references recent legislative efforts, such as The Algorithmic Accountability Act of 2023, as crucial steps toward establishing a framework for the ethical and responsible use of AI-based technologies in health care. In conclusion, this paper advocates for a strategic approach to integrating LLMs into health care. By emphasizing the importance of maintaining clinician expertise, fostering critical engagement with LLM outputs, and navigating the legal and ethical landscape, we can ensure that LLMs serve as valuable tools in enhancing patient care and supporting health care professionals. This approach addresses the immediate challenges posed by integrating LLMs and sets a foundation for their maintainable and responsible use in the future.

Artificial Intelligence , Health Personnel , Trust , Humans , Health Personnel/psychology , Language , Learning

15.

Phonological properties of logographic words modulate brain activation in bilinguals: a comparative study of Chinese characters and Japanese Kanji.

Lin, Zhenglong; Li, Xiujun; Qi, Geqi; Yang, Jiajia; Sun, Hongzan; Guo, Qiyong; Wu, Jinglong; Xu, Min.

Cereb Cortex ; 34(4)2024 Apr 01.

Article En | MEDLINE | ID: mdl-38652552

The brain networks for the first (L1) and second (L2) languages are dynamically formed in the bilingual brain. This study delves into the neural mechanisms associated with logographic-logographic bilingualism, where both languages employ visually complex and conceptually rich logographic scripts. Using functional Magnetic Resonance Imaging, we examined the brain activity of Chinese-Japanese bilinguals and Japanese-Chinese bilinguals as they engaged in rhyming tasks with Chinese characters and Japanese Kanji. Results showed that Japanese-Chinese bilinguals processed both languages using common brain areas, demonstrating an assimilation pattern, whereas Chinese-Japanese bilinguals recruited additional neural regions in the left lateral prefrontal cortex for processing Japanese Kanji, reflecting their accommodation to the higher phonological complexity of L2. In addition, Japanese speakers relied more on the phonological processing route, while Chinese speakers favored visual form analysis for both languages, indicating differing neural strategy preferences between the 2 bilingual groups. Moreover, multivariate pattern analysis demonstrated that, despite the considerable neural overlap, each bilingual group formed distinguishable neural representations for each language. These findings highlight the brain's capacity for neural adaptability and specificity when processing complex logographic languages, enriching our understanding of the neural underpinnings supporting bilingual language processing.

Brain Mapping , Brain , Magnetic Resonance Imaging , Multilingualism , Humans , Male , Female , Young Adult , Brain/physiology , Brain/diagnostic imaging , Adult , Phonetics , Reading , Language , Japan

16.

The Era of ChatGPT and Large Language Models: Can We Advance Patient-centered Communications Appropriately and Safely?

Tu, Wendy.

Radiol Imaging Cancer ; 6(3): e240038, 2024 May.

Article En | MEDLINE | ID: mdl-38668641

Patient-Centered Care , Humans , Communication , Physician-Patient Relations , Language

17.

Evaluation of large language models in breast cancer clinical scenarios: a comparative analysis based on ChatGPT-3.5, ChatGPT-4.0, and Claude2.

Deng, Linfang; Wang, Tianyi; Zhai, Zhenhua; Tao, Wei; Li, Jincheng; Zhao, Yi; Luo, Shaoting; Xu, Jinjiang.

Int J Surg ; 110(4): 1941-1950, 2024 Apr 01.

Article En | MEDLINE | ID: mdl-38668655

BACKGROUND: Large language models (LLMs) have garnered significant attention in the AI domain owing to their exemplary context recognition and response capabilities. However, the potential of LLMs in specific clinical scenarios, particularly in breast cancer diagnosis, treatment, and care, has not been fully explored. This study aimed to compare the performances of three major LLMs in the clinical context of breast cancer. METHODS: In this study, clinical scenarios designed specifically for breast cancer were segmented into five pivotal domains (nine cases): assessment and diagnosis, treatment decision-making, postoperative care, psychosocial support, and prognosis and rehabilitation. The LLMs were used to generate feedback for various queries related to these domains. For each scenario, a panel of five breast cancer specialists, each with over a decade of experience, evaluated the feedback from LLMs. They assessed feedback concerning LLMs in terms of their quality, relevance, and applicability. RESULTS: There was a moderate level of agreement among the raters (Fleiss' kappa=0.345, P<0.05). Comparing the performance of different models regarding response length, GPT-4.0 and GPT-3.5 provided relatively longer feedback than Claude2. Furthermore, across the nine case analyses, GPT-4.0 significantly outperformed the other two models in average quality, relevance, and applicability. Within the five clinical areas, GPT-4.0 markedly surpassed GPT-3.5 in the quality of the other four areas and scored higher than Claude2 in tasks related to psychosocial support and treatment decision-making. CONCLUSION: This study revealed that in the realm of clinical applications for breast cancer, GPT-4.0 showcases not only superiority in terms of quality and relevance but also demonstrates exceptional capability in applicability, especially when compared to GPT-3.5. Relative to Claude2, GPT-4.0 holds advantages in specific domains. With the expanding use of LLMs in the clinical field, ongoing optimization and rigorous accuracy assessments are paramount.

Breast Neoplasms , Humans , Female , Clinical Decision-Making , Language

18.

How children generalize novel nouns: An eye-tracking analysis of their generalization strategies.

Stansbury, Eleanor; Witt, Arnaud; Bard, Patrick; Thibaut, Jean-Pierre.

PLoS One ; 19(4): e0296841, 2024.

Article En | MEDLINE | ID: mdl-38568960

Recent research has shown that comparisons of multiple learning stimuli which are associated with the same novel noun favor taxonomic generalization of this noun. These findings contrast with single-stimulus learning in which children follow so-called lexical biases. However, little is known about the underlying search strategies. The present experiment provides an eye-tracking analysis of search strategies during novel word learning in a comparison design. We manipulated both the conceptual distance between the two learning items, i.e., children saw examples which were associated with a noun (e.g., the two learning items were either two bracelets in a "close" comparison condition or a bracelet and a watch in a "far" comparison condition), and the conceptual distance between the learning items and the taxonomically related items in the generalization options (e.g., the taxonomic generalization answer; a pendant, a near generalization item; versus a bow tie, a distant generalization item). We tested 5-, 6- and 8-year-old children's taxonomic (versus perceptual and thematic) generalization of novel names for objects. The search patterns showed that participants first focused on the learning items and then compared them with each of the possible choices. They also spent less time comparing the various options with one another; this search profile remained stable across age groups. Data also revealed that early comparisons, (i.e., reflecting alignment strategies) predicted generalization performance. We discuss four search strategies as well as the effect of age and conceptual distance on these strategies.

Eye-Tracking Technology , Vocabulary , Child , Humans , Language , Learning , Generalization, Psychological

19.

Evaluation of the safety, accuracy, and helpfulness of the GPT-4.0 Large Language Model in neurosurgery.

Huang, Kevin T; Mehta, Neel H; Gupta, Saksham; See, Alfred P; Arnaout, Omar.

J Clin Neurosci ; 123: 151-156, 2024 May.

Article En | MEDLINE | ID: mdl-38574687

BACKGROUND: Although prior work demonstrated the surprising accuracy of Large Language Models (LLMs) on neurosurgery board-style questions, their use in day-to-day clinical situations warrants further investigation. This study assessed GPT-4.0's responses to common clinical questions across various subspecialties of neurosurgery. METHODS: A panel of attending neurosurgeons formulated 35 general neurosurgical questions spanning neuro-oncology, spine, vascular, functional, pediatrics, and trauma. All questions were input into GPT-4.0 with a prespecified, standard prompt. Responses were evaluated by two attending neurosurgeons, each on a standardized scale for accuracy, safety, and helpfulness. Citations were indexed and evaluated against identifiable database references. RESULTS: GPT-4.0 responses were consistent with current medical guidelines and accounted for recent advances in the field 92.8 % and 78.6 % of the time respectively. Neurosurgeons reported GPT-4.0 responses providing unrealistic information or potentially risky information 14.3 % and 7.1 % of the time respectively. Assessed on 5-point scales, responses suggested that GPT-4.0 was clinically useful (4.0 ± 0.6), relevant (4.7 ± 0.3), and coherent (4.9 ± 0.2). The depth of clinical responses varied (3.7 ± 0.6), and "red flag" symptoms were missed 7.1 % of the time. Moreover, GPT-4.0 cited 86 references (2.46 citations per answer), of which only 50 % were deemed valid, and 77.1 % of responses contained at least one inappropriate citation. CONCLUSION: Current general LLM technology can offer generally accurate, safe, and helpful neurosurgical information, but may not fully evaluate medical literature or recent field advances. Citation generation and usage remains unreliable. As this technology becomes more ubiquitous, clinicians will need to exercise caution when dealing with it in practice.

Neurosurgeons , Neurosurgery , Humans , Neurosurgery/methods , Neurosurgery/standards , Neurosurgeons/standards , Neurosurgical Procedures/methods , Neurosurgical Procedures/standards , Language

20.

Medical articles should include a plain language summary.

Seghier, Mohamed L.

Lancet ; 403(10436): 1539-1540, 2024 Apr 20.

Article En | MEDLINE | ID: mdl-38642950

Language , Humans